Algorithmic Programming Language Identification

Authors

  • David Klein
  • Kyle Murray
  • Simon Weber
Abstract

Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms that of an existing tool that relies on a Bayesian classifier. Code is written in Python and available under an MIT license.
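
As a rough illustration of what supervised learning over statistical features can look like for this task, here is a minimal Python sketch. It is not the authors' implementation, since the abstract does not describe their features or classifier. The token-frequency features, the nearest-centroid classifier with cosine similarity, the names `featurize`, `train`, `cosine`, and `classify`, and the toy training snippets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed feature set: relative frequencies of words and punctuation
# characters. The paper's real features are not given in the abstract.
TOKEN_RE = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def featurize(source):
    """Map source text to a dict of token -> relative frequency."""
    counts = Counter(TOKEN_RE.findall(source))
    total = sum(counts.values()) or 1
    return {tok: n / total for tok, n in counts.items()}


def train(samples):
    """Build one centroid (mean feature vector) per labeled language."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for lang, source in samples:
        sums[lang].update(featurize(source))  # adds the frequencies
        sizes[lang] += 1
    return {
        lang: {tok: v / sizes[lang] for tok, v in vec.items()}
        for lang, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(tok, 0.0) for tok, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(model, source):
    """Return the language whose centroid is most similar to the input."""
    vec = featurize(source)
    return max(model, key=lambda lang: cosine(vec, model[lang]))


# Toy labeled data (hypothetical); a real system needs far more samples.
TRAINING = [
    ("python", "def add(a, b):\n    return a + b\n"),
    ("python", "for x in items:\n    print(x)\n"),
    ("c", "int add(int a, int b) {\n    return a + b;\n}\n"),
    ("c", "for (int i = 0; i < n; i++) {\n    printf(\"%d\", i);\n}\n"),
]

model = train(TRAINING)
```

Calling `classify(model, snippet)` on an unseen snippet returns the label of the closest centroid. With only four training samples the result is a toy, but the same shape (extract features, fit on labeled code, predict a label) is what a supervised approach to this problem generally follows.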

Similar resources

Parallel Genetic Algorithm Using Algorithmic Skeleton

Algorithmic skeleton has received attention as an efficient method of parallel programming in recent years. Using the method, the programmer can implement parallel programs easily. In this study, a set of efficient algorithmic skeletons is introduced for use in implementing parallel genetic algorithm (PGA). A performance model is derived for each skeleton that makes the comparison of skeletons po...

SLAC-PUB-6349, September 1993 (T/E): A Unified Treatment of Track Reconstruction and Particle Identification

In this note, the deformable template or elastic arm approach to track reconstruction described previously will be extended to include particle identification. The discussion will first develop the mathematical and algorithmic structure of the pattern recognition problem using the methodology and language of statistical mechanics and then sketch its implementation in an object oriented programm...

Characterizing Language Identification by Standardizing Operations

Notions from formal language learning theory are characterized in terms of standardizing operations on classes of recursively enumerable languages. Algorithmic identification in the limit of grammars from text presentation of recursively enumerable languages is a central paradigm of language learning. A mapping, F, from the set of all grammars into the set of all grammars is a standardizing ope...

Towards the Classification of Algorithmic Skeletons

Algorithmic skeletons are seen as being high level parallel programming language constructs encapsulating the expression of parallelism, communication, synchronisation, embedding and costing. This report examines the classification of algorithmic skeletons, proposing one classification and examining others which have been devised. Various algorithmic skeletons are examined and these are categorised to...

Journal:
  • CoRR

Volume: abs/1106.4064

Publication date: 2011